bm/perf-eliminate-redundant-parsing by datastx · Pull Request #91 · datastx/Feather-Flow

datastx · 2026-02-23T06:55:57Z

Summary

Eliminate YAML double-parse (31.8% of heap): process_node_dir() already reads YAML to probe the kind field, then discards content. Added load_from_str() variants to ModelSchema, SourceFile, and FunctionDef so loaders reuse the already-read content instead of re-reading from disk and re-parsing.
Incremental FeatherFlowProvider (11.6% of heap): propagate_schemas() was rebuilding a new FeatherFlowProvider (converting all Arrow schemas) for every model in topo order — O(N²). Now builds once before the loop and calls insert_schema() incrementally — O(N).
Eliminate SQL double-parse in qualify (9.1% of heap): qualify_table_references() re-parsed SQL that was already parsed in compile_model_phase1(). Added qualify_statements() that operates directly on the existing AST. CompileOutput now carries parsed statements through to the qualification phase.
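
The first change (read the YAML once, probe `kind`, then pass the same buffer to the loader) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `ModelSchema` struct here is a one-field stand-in, `probe_kind` and `process_node` are hypothetical helper names, and a toy line scanner replaces the real YAML parser so the example has no dependencies.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical stand-in for the PR's ModelSchema; the real type has more fields.
#[derive(Debug, PartialEq)]
struct ModelSchema {
    name: String,
}

impl ModelSchema {
    /// Parse from content the caller has already read (no disk access here).
    fn load_from_str(content: &str) -> Option<Self> {
        // Toy "parser": look for a `name: <value>` line.
        // The real loader would run a YAML parser over `content`.
        content.lines().find_map(|line| {
            line.strip_prefix("name:").map(|value| ModelSchema {
                name: value.trim().to_string(),
            })
        })
    }

    /// Path-based variant kept as a thin wrapper for callers that only have a path.
    fn load(path: &Path) -> io::Result<Option<Self>> {
        let content = fs::read_to_string(path)?;
        Ok(Self::load_from_str(&content))
    }
}

/// Probe the `kind` field without consuming or discarding the content.
fn probe_kind(content: &str) -> Option<&str> {
    content
        .lines()
        .find_map(|line| line.strip_prefix("kind:").map(str::trim))
}

/// Read once, probe, then hand the same buffer to the matching loader.
fn process_node(content: &str) -> Option<ModelSchema> {
    match probe_kind(content)? {
        "model" => ModelSchema::load_from_str(content),
        // Other kinds (sources, functions) would dispatch to their own loaders.
        _ => None,
    }
}
```

The design point is that the string-based loader does the parsing and the path-based loader only wraps it, so a caller that already holds the file content never triggers a second read or a second parse.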

Benchmark Results

Benchmark	Change
project_load	-13.6%
dag_topological_sort	-10.1%
propagate_schemas_small	-5.2%
parse_complex_join	-5.7%
qualify_table_references	-4.5%
dag_build	-5.0%

All improvements confirmed statistically significant by criterion (p < 0.05).

Test plan

make test — all 1,092 tests pass
make bench — criterion benchmarks confirm improvements
make ci-e2e — end-to-end test harness
make profile-memory — verify heap reduction via dhat

🤖 Generated with Claude Code

Profiling showed 53% of heap allocations (6.7 MB / 12.7 MB) came from three redundant parse operations. This commit eliminates all three:

1. YAML double-parse (31.8% of heap): process_node_dir() already reads YAML to probe the kind field, then discards it. Loaders re-read and re-parse the same file. Fix: add load_from_str() variants and pass the already-read content through to loaders.
2. FeatherFlowProvider rebuilt per model (11.6%): propagate_schemas() constructed a new provider for every model in topo order, rebuilding Arrow schema maps from scratch each time (O(N²)). Fix: build once before the loop, incrementally insert_schema() after each model.
3. SQL double-parse in qualify (9.1%): qualify_table_references() re-parses SQL that was already parsed in compile_model_phase1(). Fix: add qualify_statements() that operates on the existing AST, store parsed statements in CompileOutput.

Benchmarks: project_load -13.6%, propagate_schemas -5.2%, qualify_table_references -4.5%, dag_topological_sort -10.1%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
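
To make the O(N²) versus O(N) difference in the provider change concrete, here is a small sketch that counts schema conversions under both strategies. It is an illustration under stated assumptions, not the project's code: `Provider` is a hypothetical stand-in for FeatherFlowProvider, a schema is just a list of column names rather than an Arrow schema, and the two `propagate_*` functions are invented names that only model the loop structure.

```rust
use std::collections::HashMap;

// Hypothetical stand-in: a schema is a list of column names, not an Arrow schema.
type Schema = Vec<String>;

#[derive(Default)]
struct Provider {
    tables: HashMap<String, Schema>,
    conversions: usize, // counts schema conversions so the cost is visible
}

impl Provider {
    /// Register one schema; each call models one schema conversion.
    fn insert_schema(&mut self, name: &str, schema: &Schema) {
        self.tables.insert(name.to_string(), schema.clone());
        self.conversions += 1;
    }
}

/// Old shape: rebuild the provider from all known schemas for every model.
/// Returns the total number of conversions performed.
fn propagate_rebuilding(models: &[(String, Schema)]) -> usize {
    let mut known: Vec<&(String, Schema)> = Vec::new();
    let mut total = 0;
    for model in models {
        let mut provider = Provider::default();
        for m in &known {
            provider.insert_schema(&m.0, &m.1);
        }
        // ... this model's schema would be inferred against `provider` here ...
        total += provider.conversions;
        known.push(model);
    }
    total
}

/// New shape: build once before the loop, insert after each model.
fn propagate_incremental(models: &[(String, Schema)]) -> usize {
    let mut provider = Provider::default();
    for (name, schema) in models {
        // ... this model's schema would be inferred against `provider` here ...
        provider.insert_schema(name, schema);
    }
    provider.conversions
}
```

With N models in topological order, the rebuilding version performs 0 + 1 + ... + (N - 1) conversions, while the incremental version performs exactly N.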

datastx merged commit c3f02ce into main Feb 23, 2026
9 checks passed

datastx merged 1 commit into main from bm/perf-eliminate-redundant-parsing
